On Spectral Learning of Mixtures of Distributions

Authors

  • Dimitris Achlioptas
  • Frank McSherry

Abstract

We consider the problem of learning mixtures of distributions via spectral methods and derive a tight characterization of when such methods are useful. Specifically, given a mixture-sample, let μ_i, C_i, w_i denote the empirical mean, covariance matrix, and mixing weight of the i-th component. We prove that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample when each μ_i is separated from all μ_j by ‖C_i‖_2 (1/w_i + 1/w_j) plus a term that depends on the concentration properties of the distributions in the mixture. This second term is very small for many distributions, including Gaussian and log-concave distributions, among many others. As a result, we get the best known bounds for learning mixtures of arbitrary Gaussians in terms of the required mean separation. On the other hand, we prove that given any k means μ_i and mixing weights w_i, there are (many) sets of matrices C_i such that each μ_i is separated from all μ_j by ‖C_i‖_2 (1/w_i + 1/w_j), but applying spectral projection to the corresponding Gaussian mixture causes it to collapse completely, i.e., all means and covariance matrices in the projected mixture are identical.
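
The two-step procedure named in the abstract, spectral projection followed by single-linkage clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spectral_project` and `single_linkage`, the use of an uncentered SVD, and the toy data (two spherical Gaussians in R^20 with means 30 units apart) are all assumptions made here for illustration, and the separation is chosen to be far larger than any bound discussed in the paper.

```python
import numpy as np


def spectral_project(X, k):
    """Coordinates of each row of X in the span of the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:k].T


def single_linkage(P, k):
    """Single-linkage clustering: merge the closest pair of clusters until k remain."""
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    rows, cols = np.triu_indices(n, k=1)
    parent = list(range(n))  # union-find forest over the sample points

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    clusters = n
    # Visit point pairs from closest to farthest, as in Kruskal's algorithm.
    for e in np.argsort(dist[rows, cols]):
        if clusters == k:
            break
        a, b = find(int(rows[e])), find(int(cols[e]))
        if a != b:
            parent[a] = b
            clusters -= 1

    roots = [find(i) for i in range(n)]
    ids = {r: j for j, r in enumerate(dict.fromkeys(roots))}
    return np.array([ids[r] for r in roots])


# Toy data (illustrative assumption): two spherical Gaussians in R^20, far apart.
rng = np.random.default_rng(0)
means = np.zeros((2, 20))
means[1, 0] = 30.0
X = np.vstack([rng.normal(m, 1.0, size=(50, 20)) for m in means])

labels = single_linkage(spectral_project(X, k=2), k=2)
print(labels[:50], labels[50:])
```

Stopping the merges once k clusters remain is one simple way to cut the single-linkage hierarchy. The sketch says nothing about the harder regime the abstract addresses, where the separation is close to the stated threshold or the covariances are chosen adversarially.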


Similar resources

Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions

We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D_1, ..., D_T} and weights w_1, ..., w_T. A sample from a mixture is drawn by selecting D_i with probability w_i and then selecting a sample from D_i. The goal, in learning a mixture, is to learn the parameters of the distributions ...

Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have s...

A Spectral Algorithm for Learning Mixtures of Distributions

We show that a simple spectral algorithm for learning a mixture of k spherical Gaussians in R^n works remarkably well: it succeeds in identifying the Gaussians assuming essentially the minimum possible separation between their centers that keeps them unique (solving an open problem of [1]). The sample complexity and running time are polynomial in both n and k. The algorithm also works for the m...

The Spectral Method for General Mixture Models

We present an algorithm for learning a mixture of distributions based on spectral projection. We prove a general property of spectral projection for arbitrary mixtures and show that the resulting algorithm is efficient when the components of the mixture are log-concave distributions in R^n whose means are separated. The separation required grows with k, the number of components, and log n. This is ...

Standard Addition Connected to Selective Zone Discovering for Quantification in the Unknown Mixtures

Univariate calibration method is a simple, cheap and easy to use procedure in analytical chemistry. A univariate analysis will be successful if a selective signal can be found for the analyte(s). In this work, two simple ways were used to find the selective signals, spectral ratio plot (SRP) and loading plot (LP). Both of them were able to discover the selective regions in the recorded data set...

Publication date: 2005